68 research outputs found
Assembly of an interactive correlation network for the Arabidopsis genome using a novel heuristic clustering algorithm
Revisiting ancestral polyploidy in plants
Whole-genome duplications (WGDs) or polyploidy events have been studied extensively in plants. In a now widely cited paper, Jiao et al. presented evidence for two ancient, ancestral plant WGDs predating the origin of flowering and seed plants, respectively. This finding was based primarily on a bimodal age distribution of gene duplication events obtained from molecular dating of almost 800 phylogenetic gene trees. We reanalyzed the phylogenomic data of Jiao et al. and found that the strong bimodality of the age distribution may be the result of technical and methodological issues and may hence not be a "true" signal of two WGD events. By using a state-of-the-art molecular dating algorithm, we demonstrate that the reported bimodal age distribution is not robust and should be interpreted with caution. Thus, there exists little evidence for two ancient WGDs in plants from phylogenomic dating.
GeneCAT: novel webtools that combine BLAST and co-expression analyses
The gene co-expression analysis toolbox (GeneCAT) introduces several novel tools for analyzing microarray data. First, the multigene co-expression analysis, combined with co-expressed gene networks, provides a more powerful data mining technique than standard, single-gene co-expression analysis. Second, the high-throughput Map-O-Matic tool matches the co-expression patterns of multiple query genes to genes present in user-defined subdatabases, and can therefore be used for gene mapping in forward genetic screens. Third, Rosetta combines co-expression analysis with BLAST and can be used to find "true" gene orthologs in the plant model organisms Arabidopsis thaliana and Hordeum vulgare (barley). GeneCAT is equipped with expression data for the model plant A. thaliana, and is the first to introduce co-expression mining tools for the monocot barley. GeneCAT is available at http://genecat.mpg.d
Plant Expression Omnibus
Gene expression matrices and sample annotation files for 103 species of Archaeplastida. Data used to construct expression.plant.tools.
Fig 1. eFP - Copy.pdf
Anatomy of Brachypodium distachyon organs at different stages of development, as published in Sibout et al., 2017, "Expression atlas and comparative co-expression network analyses reveal important genes involved in the formation of lignified cell wall in Brachypodium distachyon", New Phytologist (in press). The age of sampled organs is given as DAG (days after germination), DAF (days after fertilization), DAH (days after heading), or Years. Artworks by Debbie Maizels, Zoobotanica Scientific Illustration.
Malaria.tools - comparative genomic and transcriptomic database for Plasmodium species
Malaria is a tropical parasitic disease caused by the Plasmodium genus, which resulted in an estimated 219 million cases of malaria and 435 000 malaria-related deaths in 2017. Despite the availability of the Plasmodium falciparum genome since 2002, 74% of the genes remain uncharacterized. To remedy this paucity of functional information, we used transcriptomic data to build gene co-expression networks for two Plasmodium species (P. falciparum and P. berghei), and included genomic data of four other Plasmodium species, P. yoelii, P. knowlesi, P. vivax and P. cynomolgi, as well as two non-Plasmodium species from the Apicomplexa, Toxoplasma gondii and Theileria parva. The genomic and transcriptomic data were incorporated into the resulting database, malaria.tools, which is preloaded with tools that allow the identification and cross-species comparison of co-expressed gene neighbourhoods, clusters and life stage-specific expression, thus providing sophisticated tools to predict gene function. Moreover, we exemplify how the tools can be used to easily identify genes relevant for pathogenicity and various life stages of the malaria parasite. The database is freely available at www.malaria.tools.
Inferring biosynthetic and gene regulatory networks from Artemisia annua RNA sequencing data on a credit card-sized ARM computer
Prediction of gene function and gene regulatory networks is one of the most active topics in bioinformatics. The accumulation of publicly available gene expression data for hundreds of plant species, together with advances in bioinformatical methods and affordable computing, sets ingenuity as one of the major bottlenecks in understanding gene function and regulation. Here, we show how a credit card-sized computer retailing for <50 USD can be used to rapidly predict gene function and infer regulatory networks from RNA sequencing data. To achieve this, we constructed a bioinformatical pipeline that downloads RNA sequencing data, allows its quality control, and generates a gene co-expression network that can reveal enzymes and transcription factors participating in and controlling a given biosynthetic pathway. We exemplify this by first identifying genes and transcription factors involved in the biosynthesis of secondary cell wall in the plant Artemisia annua, the main natural source of the anti-malarial drug artemisinin. Networks were then used to dissect the artemisinin biosynthesis pathway, which suggested potential transcription factors regulating artemisinin biosynthesis. We provide the source code of our pipeline (https://github.com/mutwil/LSTrAP-Lite) and envision that the ubiquity of affordable computing, availability of biological data and increased bioinformatical training of biologists will transform the field of bioinformatics. This article is part of a Special Issue entitled: Transcriptional Profiles and Regulatory Gene Networks edited by Dr. Federico Manuel Giorgi and Dr. Shaun Mahony.
LSTrAP: efficiently combining RNA sequencing data into co-expression networks
Background: Since experimental elucidation of gene function is often laborious, various in silico methods have been developed to predict the function of uncharacterized genes. Since functionally related genes are often expressed in the same tissues, conditions and developmental stages (co-expressed), functional annotation of characterized genes can be transferred to co-expressed genes lacking annotation. With genome-wide expression data available, the construction of co-expression networks, where genes are nodes and edges connect significantly co-expressed genes, provides unprecedented opportunities to predict gene function. However, the construction of such networks requires large volumes of high-quality data, multiple processing steps and a considerable amount of computational power. While efficient tools exist to process RNA-Seq data, pipelines which combine them to construct co-expression networks efficiently are currently lacking. Results: LSTrAP (Large-Scale Transcriptome Analysis Pipeline), presented here, combines all essential tools to construct co-expression networks based on RNA-Seq data into a single, efficient workflow. By supporting parallel computing on computer cluster infrastructure, processing hundreds of samples becomes feasible, as shown here for Arabidopsis thaliana and Sorghum bicolor, which comprised 876 and 215 samples respectively. The former was used here to show how the quality control included in LSTrAP can detect spurious or low-quality samples. The latter was used to show how co-expression networks are able to group known photosynthesis genes and imply a role in this process for several currently uncharacterized genes. Conclusions: LSTrAP combines the most popular and performant methods to construct co-expression networks from RNA-Seq data into a single workflow. This allows large amounts of expression data, required to construct co-expression networks, to be processed efficiently and consistently across hundreds of samples.
LSTrAP is implemented in Python 3.4 (or higher) and available under MIT license from https://github.molgen.mpg.de/proost/LSTrA
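The network idea described in this abstract (genes as nodes, edges between co-expressed genes, annotation passed to unannotated neighbours) can be illustrated with a minimal sketch. This is not LSTrAP's code: the Pearson correlation measure, the fixed 0.9 cutoff, the gene names and the expression values are all assumptions chosen for illustration, and a real analysis would use a significance criterion rather than an arbitrary threshold.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def build_network(expression, threshold=0.9):
    """Return {gene: set of neighbours}, with an edge wherever r >= threshold."""
    network = {gene: set() for gene in expression}
    for g1, g2 in combinations(expression, 2):
        if pearson(expression[g1], expression[g2]) >= threshold:
            network[g1].add(g2)
            network[g2].add(g1)
    return network


def transfer_annotation(network, annotation):
    """Suggest labels for unannotated genes from their annotated neighbours."""
    suggestions = {}
    for gene, neighbours in network.items():
        if gene in annotation:
            continue  # already characterized
        labels = {annotation[n] for n in neighbours if n in annotation}
        if labels:
            suggestions[gene] = labels
    return suggestions


# Toy data: hypothetical genes measured across five hypothetical samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.8],
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 1.0, 2.0, 1.0, 1.0],
}
annotation = {"geneA": "photosynthesis"}  # hypothetical known annotation

network = build_network(expression, threshold=0.9)
suggestions = transfer_annotation(network, annotation)
print(suggestions)
```

In this toy example only geneA and geneB exceed the cutoff, so geneB is the only gene that receives a suggested label. The pairwise loop is quadratic in the number of genes, which hints at why genome-scale networks need the computational resources the abstract mentions.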